Overview

Dataset statistics

Number of variables: 14
Number of observations: 98913
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 91012
Duplicate rows (%): 92.0%
Total size in memory: 42.8 MiB
Average record size in memory: 453.6 B
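The percentage and per-record figures above follow from the raw counts. The sketch below recomputes them from the reported numbers only. The helper name `pct` is illustrative, and the small gap in the record size is assumed to come from the total being rounded to 42.8 MiB.

```python
# Figures copied from the dataset statistics above.
n_observations = 98913
n_duplicate_rows = 91012
total_size_mib = 42.8


def pct(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)


duplicate_share = pct(n_duplicate_rows, n_observations)

# 1 MiB = 1024 * 1024 bytes.
bytes_per_record = total_size_mib * 1024 * 1024 / n_observations

print(duplicate_share)             # share of duplicate rows, in percent
print(round(bytes_per_record, 1))  # close to the reported 453.6 B
```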

Variable types

Categorical: 6
Numeric: 8

Warnings

Users has constant value "Users" [Constant]
Dataset has 91012 (92.0%) duplicate rows [Duplicates]
seller is highly correlated with Users [High correlation]
Social use is highly correlated with Users [High correlation]
A_user is highly correlated with Users [High correlation]
Users is highly correlated with seller and 4 other fields [High correlation]
language is highly correlated with Users [High correlation]
Buyer is highly correlated with Users [High correlation]
socialNbFollowers is highly skewed (γ1 = 88.81691016) [Skewed]
socialNbFollows is highly skewed (γ1 = 220.8766787) [Skewed]
socialProductsLiked is highly skewed (γ1 = 244.1577429) [Skewed]
productsListed is highly skewed (γ1 = 64.89321853) [Skewed]
productsSold is highly skewed (γ1 = 41.59563253) [Skewed]
productsWished is highly skewed (γ1 = 49.25695941) [Skewed]
productsBought is highly skewed (γ1 = 84.79735987) [Skewed]
socialProductsLiked has 82987 (83.9%) zeros [Zeros]
productsListed has 97189 (98.3%) zeros [Zeros]
productsSold has 96877 (97.9%) zeros [Zeros]
productsPassRate has 97979 (99.1%) zeros [Zeros]
productsWished has 89612 (90.6%) zeros [Zeros]
productsBought has 93494 (94.5%) zeros [Zeros]
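The Skewed warnings report γ1, the skewness of each variable. As a rough illustration, the sketch below computes the moment-based skewness (third central moment divided by the 1.5 power of the second). The function name and the toy data are hypothetical, and the report's own estimator may apply a small-sample bias correction, so values need not match to the last digit.

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return float("nan")  # skewness is undefined for a constant column
    return m3 / m2 ** 1.5


# Toy data shaped like the columns above: mostly zeros plus one large value.
mostly_zeros = [0] * 9 + [10]
symmetric = [1, 2, 3]

print(skewness(mostly_zeros))  # strongly positive: long right tail
print(skewness(symmetric))     # 0.0 for symmetric data
```

A single large value among many zeros is enough to push γ1 well above zero, which is the pattern the warnings flag.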

Reproduction

Analysis started: 2021-04-01 10:26:38.412366
Analysis finished: 2021-04-01 10:28:11.904484
Duration: 1 minute and 33.49 seconds
Software version: pandas-profiling v2.10.1
Download configuration: config.yaml
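The duration is the difference between the two timestamps. A minimal check using only the values shown above (the variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the Reproduction block above.
started = datetime.fromisoformat("2021-04-01 10:26:38.412366")
finished = datetime.fromisoformat("2021-04-01 10:28:11.904484")

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)

print(f"{int(minutes)} minute and {seconds:.2f} seconds")
```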

Variables

language
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 MiB

en  51564
fr  26372
it  7766
de  7178
es  6033

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 197826
Distinct characters: 8
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
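To illustrate how per-character and per-category counts of this kind can be derived, here is a small sketch built on Python's standard `unicodedata` module. It covers general categories only (for example `Ll` for Lowercase Letter, `Zs` for Space Separator); scripts and blocks are not exposed by the standard library. The function name is hypothetical and the input is the five sample rows of this variable, not the full column.

```python
import unicodedata
from collections import Counter


def character_profile(values):
    """Count characters and their Unicode general categories."""
    chars = Counter(ch for value in values for ch in value)
    categories = Counter()
    for ch, count in chars.items():
        categories[unicodedata.category(ch)] += count
    return chars, categories


sample = ["en", "en", "fr", "en", "en"]
chars, categories = character_profile(sample)

print(chars.most_common())       # characters by frequency
print(categories.most_common())  # general categories by frequency
```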

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: en
2nd row: en
3rd row: fr
4th row: en
5th row: en

Value  Count  Frequency (%)
en     51564  52.1%
fr     26372  26.7%
it     7766   7.9%
de     7178   7.3%
es     6033   6.1%

Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
e      64775  32.7%
n      51564  26.1%
f      26372  13.3%
r      26372  13.3%
i      7766   3.9%
t      7766   3.9%
d      7178   3.6%
s      6033   3.0%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  197826  100.0%

Most frequent character per category

Value  Count  Frequency (%)
e      64775  32.7%
n      51564  26.1%
f      26372  13.3%
r      26372  13.3%
i      7766   3.9%
t      7766   3.9%
d      7178   3.6%
s      6033   3.0%

Most occurring scripts

Value  Count   Frequency (%)
Latin  197826  100.0%

Most frequent character per script

Value  Count  Frequency (%)
e      64775  32.7%
n      51564  26.1%
f      26372  13.3%
r      26372  13.3%
i      7766   3.9%
t      7766   3.9%
d      7178   3.6%
s      6033   3.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  197826  100.0%

Most frequent character per block

Value  Count  Frequency (%)
e      64775  32.7%
n      51564  26.1%
f      26372  13.3%
r      26372  13.3%
i      7766   3.9%
t      7766   3.9%
d      7178   3.6%
s      6033   3.0%

socialNbFollowers
Real number (ℝ≥0)

SKEWED

Distinct: 90
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.432268761
Minimum: 3
Maximum: 744
Zeros: 0
Zeros (%): 0.0%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 3
5-th percentile: 3
Q1: 3
median: 3
Q3: 3
95-th percentile: 5
Maximum: 744
Range: 741
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 3.882383028
Coefficient of variation (CV): 1.131141906
Kurtosis: 14415.30703
Mean: 3.432268761
Median Absolute Deviation (MAD): 0
Skewness: 88.81691016
Sum: 339496
Variance: 15.07289798
Monotonicity: Not monotonic
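Several of these statistics are simple functions of the others. The sketch below recomputes the range, IQR, coefficient of variation, variance and sum for socialNbFollowers from the reported values; the variable names are illustrative, and the number of observations is taken from the Overview.

```python
# Values copied from the socialNbFollowers statistics above.
minimum, maximum = 3, 744
q1, q3 = 3, 3
mean = 3.432268761
std = 3.882383028
n_observations = 98913  # from the Overview

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance
total = mean * n_observations    # Sum

print(value_range, iqr)
print(round(cv, 6), round(variance, 6), round(total))
```

An IQR of 0 next to a range of 741 shows how concentrated the column is: at least half of the observations share a single value, while a few accounts sit far out in the tail.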
Histogram with fixed size bins (bins=50)

Value              Count  Frequency (%)
3                  84939  85.9%
4                  8219   8.3%
5                  2720   2.7%
6                  813    0.8%
7                  539    0.5%
8                  336    0.3%
9                  235    0.2%
10                 164    0.2%
11                 121    0.1%
12                 99     0.1%
Other values (80)  728    0.7%

Value  Count  Frequency (%)
3      84939  85.9%
4      8219   8.3%
5      2720   2.7%
6      813    0.8%
7      539    0.5%
8      336    0.3%
9      235    0.2%
10     164    0.2%
11     121    0.1%
12     99     0.1%

Value  Count  Frequency (%)
744    1      < 0.1%
353    1      < 0.1%
205    1      < 0.1%
176    1      < 0.1%
172    1      < 0.1%
167    2      < 0.1%
147    1      < 0.1%
137    1      < 0.1%
131    1      < 0.1%
130    1      < 0.1%

socialNbFollows
Real number (ℝ≥0)

SKEWED

Distinct: 85
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.42567711
Minimum: 0
Maximum: 13764
Zeros: 39
Zeros (%): < 0.1%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 8
Q1: 8
median: 8
Q3: 8
95-th percentile: 8
Maximum: 13764
Range: 13764
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 52.83957192
Coefficient of variation (CV): 6.271255262
Kurtosis: 52718.3891
Mean: 8.42567711
Median Absolute Deviation (MAD): 0
Skewness: 220.8766787
Sum: 833409
Variance: 2792.02036
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value              Count  Frequency (%)
8                  94893  95.9%
9                  2386   2.4%
10                 618    0.6%
11                 260    0.3%
12                 148    0.1%
13                 94     0.1%
15                 55     0.1%
14                 53     0.1%
7                  52     0.1%
0                  39     < 0.1%
Other values (75)  315    0.3%

Value  Count  Frequency (%)
0      39     < 0.1%
1      5      < 0.1%
2      8      < 0.1%
3      6      < 0.1%
4      11     < 0.1%
5      11     < 0.1%
6      7      < 0.1%
7      52     0.1%
8      94893  95.9%
9      2386   2.4%

Value  Count  Frequency (%)
13764  1      < 0.1%
8268   1      < 0.1%
3649   1      < 0.1%
2013   1      < 0.1%
500    1      < 0.1%
482    1      < 0.1%
450    1      < 0.1%
431    1      < 0.1%
421    1      < 0.1%
209    1      < 0.1%

socialProductsLiked
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 420
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.420743482
Minimum: 0
Maximum: 51671
Zeros: 82987
Zeros (%): 83.9%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 8
Maximum: 51671
Range: 51671
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 181.0305695
Coefficient of variation (CV): 40.95025423
Kurtosis: 67765.24122
Mean: 4.420743482
Median Absolute Deviation (MAD): 0
Skewness: 244.1577429
Sum: 437269
Variance: 32772.06708
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
0                   82987  83.9%
1                   5261   5.3%
2                   1898   1.9%
3                   1215   1.2%
4                   973    1.0%
5                   644    0.7%
6                   532    0.5%
7                   436    0.4%
8                   359    0.4%
9                   316    0.3%
Other values (410)  4292   4.3%

Value  Count  Frequency (%)
0      82987  83.9%
1      5261   5.3%
2      1898   1.9%
3      1215   1.2%
4      973    1.0%
5      644    0.7%
6      532    0.5%
7      436    0.4%
8      359    0.4%
9      316    0.3%

Value  Count  Frequency (%)
51671  1      < 0.1%
16040  1      < 0.1%
7044   1      < 0.1%
5979   1      < 0.1%
5598   1      < 0.1%
5595   1      < 0.1%
5109   1      < 0.1%
3037   1      < 0.1%
2942   1      < 0.1%
2823   1      < 0.1%

productsListed
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 65
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.09330421684
Minimum: 0
Maximum: 244
Zeros: 97189
Zeros (%): 98.3%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 244
Range: 244
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.050143546
Coefficient of variation (CV): 21.97267835
Kurtosis: 5760.301256
Mean: 0.09330421684
Median Absolute Deviation (MAD): 0
Skewness: 64.89321853
Sum: 9229
Variance: 4.203088557
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value              Count  Frequency (%)
0                  97189  98.3%
1                  808    0.8%
2                  278    0.3%
3                  150    0.2%
4                  98     0.1%
5                  62     0.1%
6                  45     < 0.1%
7                  40     < 0.1%
8                  29     < 0.1%
10                 22     < 0.1%
Other values (55)  192    0.2%

Value  Count  Frequency (%)
0      97189  98.3%
1      808    0.8%
2      278    0.3%
3      150    0.2%
4      98     0.1%
5      62     0.1%
6      45     < 0.1%
7      40     < 0.1%
8      29     < 0.1%
9      20     < 0.1%

Value  Count  Frequency (%)
244    1      < 0.1%
217    1      < 0.1%
202    1      < 0.1%
185    1      < 0.1%
123    1      < 0.1%
122    1      < 0.1%
117    2      < 0.1%
113    1      < 0.1%
102    1      < 0.1%
96     1      < 0.1%

productsSold
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 75
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1215917018
Minimum: 0
Maximum: 174
Zeros: 96877
Zeros (%): 97.9%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 174
Range: 174
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.126895354
Coefficient of variation (CV): 17.49210943
Kurtosis: 2355.673441
Mean: 0.1215917018
Median Absolute Deviation (MAD): 0
Skewness: 41.59563253
Sum: 12027
Variance: 4.523683846
Monotonicity: Decreasing

Histogram with fixed size bins (bins=50)

Value              Count  Frequency (%)
0                  96877  97.9%
1                  917    0.9%
2                  325    0.3%
3                  154    0.2%
4                  124    0.1%
6                  58     0.1%
5                  58     0.1%
7                  45     < 0.1%
9                  42     < 0.1%
8                  31     < 0.1%
Other values (65)  282    0.3%

Value  Count  Frequency (%)
0      96877  97.9%
1      917    0.9%
2      325    0.3%
3      154    0.2%
4      124    0.1%
5      58     0.1%
6      58     0.1%
7      45     < 0.1%
8      31     < 0.1%
9      42     < 0.1%

Value  Count  Frequency (%)
174    1      < 0.1%
170    1      < 0.1%
163    1      < 0.1%
152    1      < 0.1%
125    1      < 0.1%
123    1      < 0.1%
108    1      < 0.1%
106    1      < 0.1%
104    1      < 0.1%
92     1      < 0.1%

productsPassRate
Real number (ℝ≥0)

ZEROS

Distinct: 72
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.8123027307
Minimum: 0
Maximum: 100
Zeros: 97979
Zeros (%): 99.1%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 100
Range: 100
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 8.500205194
Coefficient of variation (CV): 10.46433167
Kurtosis: 114.0391218
Mean: 0.8123027307
Median Absolute Deviation (MAD): 0
Skewness: 10.66729865
Sum: 80347.3
Variance: 72.25348834
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value              Count  Frequency (%)
0                  97979  99.1%
100                441    0.4%
66                 63     0.1%
50                 57     0.1%
75                 42     < 0.1%
83                 25     < 0.1%
90                 25     < 0.1%
80                 22     < 0.1%
85                 20     < 0.1%
60                 16     < 0.1%
Other values (62)  223    0.2%

Value  Count  Frequency (%)
0      97979  99.1%
25     5      < 0.1%
28     2      < 0.1%
31     1      < 0.1%
33     8      < 0.1%
35     1      < 0.1%
37     2      < 0.1%
40     2      < 0.1%
41.6   1      < 0.1%
42     1      < 0.1%

Value  Count  Frequency (%)
100    441    0.4%
99     1      < 0.1%
98.7   1      < 0.1%
98     8      < 0.1%
96.4   1      < 0.1%
96.2   1      < 0.1%
96     5      < 0.1%
95     5      < 0.1%
94     8      < 0.1%
93     12     < 0.1%

productsWished
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 279
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.562595412
Minimum: 0
Maximum: 2635
Zeros: 89612
Zeros (%): 90.6%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 2
Maximum: 2635
Range: 2635
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 25.19279323
Coefficient of variation (CV): 16.12240317
Kurtosis: 3369.163069
Mean: 1.562595412
Median Absolute Deviation (MAD): 0
Skewness: 49.25695941
Sum: 154561
Variance: 634.6768308
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
0                   89612  90.6%
1                   3375   3.4%
2                   1339   1.4%
3                   797    0.8%
4                   526    0.5%
5                   406    0.4%
6                   299    0.3%
7                   252    0.3%
8                   176    0.2%
9                   158    0.2%
Other values (269)  1973   2.0%

Value  Count  Frequency (%)
0      89612  90.6%
1      3375   3.4%
2      1339   1.4%
3      797    0.8%
4      526    0.5%
5      406    0.4%
6      299    0.3%
7      252    0.3%
8      176    0.2%
9      158    0.2%

Value  Count  Frequency (%)
2635   1      < 0.1%
1916   1      < 0.1%
1900   1      < 0.1%
1842   1      < 0.1%
1820   1      < 0.1%
1783   1      < 0.1%
1622   1      < 0.1%
1295   1      < 0.1%
1225   1      < 0.1%
1113   1      < 0.1%

productsBought
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 70
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1719288668
Minimum: 0
Maximum: 405
Zeros: 93494
Zeros (%): 94.5%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 405
Range: 405
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.332265666
Coefficient of variation (CV): 13.56529424
Kurtosis: 11871.75975
Mean: 0.1719288668
Median Absolute Deviation (MAD): 0
Skewness: 84.79735987
Sum: 17006
Variance: 5.439463136
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value              Count  Frequency (%)
0                  93494  94.5%
1                  3297   3.3%
2                  845    0.9%
3                  364    0.4%
4                  214    0.2%
5                  139    0.1%
6                  108    0.1%
7                  65     0.1%
8                  52     0.1%
9                  40     < 0.1%
Other values (60)  295    0.3%

Value  Count  Frequency (%)
0      93494  94.5%
1      3297   3.3%
2      845    0.9%
3      364    0.4%
4      214    0.2%
5      139    0.1%
6      108    0.1%
7      65     0.1%
8      52     0.1%
9      40     < 0.1%

Value  Count  Frequency (%)
405    1      < 0.1%
279    1      < 0.1%
174    1      < 0.1%
115    1      < 0.1%
105    1      < 0.1%
93     1      < 0.1%
87     1      < 0.1%
85     1      < 0.1%
81     1      < 0.1%
80     1      < 0.1%

A_user
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.1 MiB

inactive  77274
active    21639

Length

Max length: 8
Median length: 8
Mean length: 7.562463984
Min length: 6

Characters and Unicode

Total characters: 748026
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: active
2nd row: active
3rd row: active
4th row: active
5th row: active

Value     Count  Frequency (%)
inactive  77274  78.1%
active    21639  21.9%

Histogram of lengths of the category

Most occurring characters

Value  Count   Frequency (%)
i      176187  23.6%
a      98913   13.2%
c      98913   13.2%
t      98913   13.2%
v      98913   13.2%
e      98913   13.2%
n      77274   10.3%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  748026  100.0%

Most frequent character per category

Value  Count   Frequency (%)
i      176187  23.6%
a      98913   13.2%
c      98913   13.2%
t      98913   13.2%
v      98913   13.2%
e      98913   13.2%
n      77274   10.3%

Most occurring scripts

Value  Count   Frequency (%)
Latin  748026  100.0%

Most frequent character per script

Value  Count   Frequency (%)
i      176187  23.6%
a      98913   13.2%
c      98913   13.2%
t      98913   13.2%
v      98913   13.2%
e      98913   13.2%
n      77274   10.3%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  748026  100.0%

Most frequent character per block

Value  Count   Frequency (%)
i      176187  23.6%
a      98913   13.2%
c      98913   13.2%
t      98913   13.2%
v      98913   13.2%
e      98913   13.2%
n      77274   10.3%

Social use
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.7 MiB

non social user  83318
social user      15595

Length

Max length: 15
Median length: 15
Mean length: 14.36934478
Min length: 11

Characters and Unicode

Total characters: 1421315
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: social user
2nd row: social user
3rd row: social user
4th row: social user
5th row: social user

Value            Count  Frequency (%)
non social user  83318  84.2%
social user      15595  15.8%

Histogram of lengths of the category

Value   Count  Frequency (%)
user    98913  35.2%
social  98913  35.2%
non     83318  29.6%

Most occurring characters

Value    Count   Frequency (%)
s        197826  13.9%
o        182231  12.8%
(space)  182231  12.8%
n        166636  11.7%
c        98913   7.0%
i        98913   7.0%
a        98913   7.0%
l        98913   7.0%
u        98913   7.0%
e        98913   7.0%

Most occurring categories

Value             Count    Frequency (%)
Lowercase Letter  1239084  87.2%
Space Separator   182231   12.8%

Most frequent character per category

Value  Count   Frequency (%)
s      197826  16.0%
o      182231  14.7%
n      166636  13.4%
c      98913   8.0%
i      98913   8.0%
a      98913   8.0%
l      98913   8.0%
u      98913   8.0%
e      98913   8.0%
r      98913   8.0%

Value    Count   Frequency (%)
(space)  182231  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Latin   1239084  87.2%
Common  182231   12.8%

Most frequent character per script

Value  Count   Frequency (%)
s      197826  16.0%
o      182231  14.7%
n      166636  13.4%
c      98913   8.0%
i      98913   8.0%
a      98913   8.0%
l      98913   8.0%
u      98913   8.0%
e      98913   8.0%
r      98913   8.0%

Value    Count   Frequency (%)
(space)  182231  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1421315  100.0%

Most frequent character per block

Value    Count   Frequency (%)
s        197826  13.9%
o        182231  12.8%
(space)  182231  12.8%
n        166636  11.7%
c        98913   7.0%
i        98913   7.0%
a        98913   7.0%
l        98913   7.0%
u        98913   7.0%
e        98913   7.0%

Buyer
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.2 MiB

Non buyer  93494
Buyer      5419

Length

Max length: 9
Median length: 9
Mean length: 8.780857926
Min length: 5

Characters and Unicode

Total characters: 868541
Distinct characters: 10
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Buyer
2nd row: Non buyer
3rd row: Buyer
4th row: Non buyer
5th row: Non buyer

Value      Count  Frequency (%)
Non buyer  93494  94.5%
Buyer      5419   5.5%

Histogram of lengths of the category

Value  Count  Frequency (%)
buyer  98913  51.4%
non    93494  48.6%

Most occurring characters

Value    Count  Frequency (%)
u        98913  11.4%
y        98913  11.4%
e        98913  11.4%
r        98913  11.4%
N        93494  10.8%
o        93494  10.8%
n        93494  10.8%
(space)  93494  10.8%
b        93494  10.8%
B        5419   0.6%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  676134  77.8%
Uppercase Letter  98913   11.4%
Space Separator   93494   10.8%

Most frequent character per category

Value  Count  Frequency (%)
u      98913  14.6%
y      98913  14.6%
e      98913  14.6%
r      98913  14.6%
o      93494  13.8%
n      93494  13.8%
b      93494  13.8%

Value  Count  Frequency (%)
N      93494  94.5%
B      5419   5.5%

Value    Count  Frequency (%)
(space)  93494  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Latin   775047  89.2%
Common  93494   10.8%

Most frequent character per script

Value  Count  Frequency (%)
u      98913  12.8%
y      98913  12.8%
e      98913  12.8%
r      98913  12.8%
N      93494  12.1%
o      93494  12.1%
n      93494  12.1%
b      93494  12.1%
B      5419   0.7%

Value    Count  Frequency (%)
(space)  93494  100.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  868541  100.0%

Most frequent character per block

Value    Count  Frequency (%)
u        98913  11.4%
y        98913  11.4%
e        98913  11.4%
r        98913  11.4%
N        93494  10.8%
o        93494  10.8%
n        93494  10.8%
(space)  93494  10.8%
b        93494  10.8%
B        5419   0.6%

seller
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.3 MiB

Non seller  96877
seller      2036

Length

Max length: 10
Median length: 10
Mean length: 9.917665019
Min length: 6

Characters and Unicode

Total characters: 980986
Distinct characters: 8
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: seller
2nd row: seller
3rd row: seller
4th row: seller
5th row: seller

Value       Count  Frequency (%)
Non seller  96877  97.9%
seller      2036   2.1%

Histogram of lengths of the category

Value   Count  Frequency (%)
seller  98913  50.5%
non     96877  49.5%

Most occurring characters

Value    Count   Frequency (%)
e        197826  20.2%
l        197826  20.2%
s        98913   10.1%
r        98913   10.1%
N        96877   9.9%
o        96877   9.9%
n        96877   9.9%
(space)  96877   9.9%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  787232  80.2%
Uppercase Letter  96877   9.9%
Space Separator   96877   9.9%

Most frequent character per category

Value  Count   Frequency (%)
e      197826  25.1%
l      197826  25.1%
s      98913   12.6%
r      98913   12.6%
o      96877   12.3%
n      96877   12.3%

Value  Count  Frequency (%)
N      96877  100.0%

Value    Count  Frequency (%)
(space)  96877  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Latin   884109  90.1%
Common  96877   9.9%

Most frequent character per script

Value  Count   Frequency (%)
e      197826  22.4%
l      197826  22.4%
s      98913   11.2%
r      98913   11.2%
N      96877   11.0%
o      96877   11.0%
n      96877   11.0%

Value    Count  Frequency (%)
(space)  96877  100.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  980986  100.0%

Most frequent character per block

Value    Count   Frequency (%)
e        197826  20.2%
l        197826  20.2%
s        98913   10.1%
r        98913   10.1%
N        96877   9.9%
o        96877   9.9%
n        96877   9.9%
(space)  96877   9.9%

Users
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.8 MiB

Users  98913

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 494565
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Users
2nd row: Users
3rd row: Users
4th row: Users
5th row: Users

Value  Count  Frequency (%)
Users  98913  100.0%

Histogram of lengths of the category

Value  Count  Frequency (%)
users  98913  100.0%

Most occurring characters

Value  Count   Frequency (%)
s      197826  40.0%
U      98913   20.0%
e      98913   20.0%
r      98913   20.0%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  395652  80.0%
Uppercase Letter  98913   20.0%

Most frequent character per category

Value  Count   Frequency (%)
s      197826  50.0%
e      98913   25.0%
r      98913   25.0%

Value  Count  Frequency (%)
U      98913  100.0%

Most occurring scripts

Value  Count   Frequency (%)
Latin  494565  100.0%

Most frequent character per script

Value  Count   Frequency (%)
s      197826  40.0%
U      98913   20.0%
e      98913   20.0%
r      98913   20.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  494565  100.0%

Most frequent character per block

Value  Count   Frequency (%)
s      197826  40.0%
U      98913   20.0%
e      98913   20.0%
r      98913   20.0%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
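The definition above can be sketched in a few lines of plain Python. This is an illustrative implementation, not the report's own code; the function name is hypothetical, and a constant input (such as the Users column) is mapped to NaN because its standard deviation is zero.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The common 1/n factors cancel, so plain sums are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        return float("nan")  # undefined for a constant variable
    return cov / math.sqrt(ss_x * ss_y)


xs = [1, 2, 3, 4]
print(pearson_r(xs, [2, 4, 6, 8]))       # perfectly linear, increasing
print(pearson_r(xs, [8, 6, 4, 2]))       # perfectly linear, decreasing
print(pearson_r(xs, [1, 4, 9, 16]))      # monotonic but not linear: below 1
```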

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
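As a sketch of that recipe: replace each value by its rank, then apply the Pearson formula to the ranks. The helper names are hypothetical, and giving tied values the average of their ranks is an assumption (a common convention); the report does not state how it handles ties.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_x) ** 2 for a in rx)
    ss_y = sum((b - mean_y) ** 2 for b in ry)
    if ss_x == 0 or ss_y == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / math.sqrt(ss_x * ss_y)


print(average_ranks([10, 20, 20, 30]))
print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # monotonic, so rho is 1
```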

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
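The pair-counting description corresponds to the following sketch. It implements exactly the formula stated above (often called tau-a), where tied pairs count as neither concordant nor discordant. The function name is hypothetical, and whether the report uses this variant or a tie-adjusted one is not stated here.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant of 6
```

The double loop is quadratic in the number of rows, so on a dataset of this size a library routine would be the practical choice.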

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. The Phik project provides extensive documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
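A minimal sketch of a bias-corrected Cramér's V in the form usually attributed to Bergsma (2013): the χ²-based φ² is reduced by its expected value under independence, and the table dimensions are corrected in the same way. The function name and toy data are hypothetical, and this is not the report's own code.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals, col_totals = Counter(xs), Counter(ys)
    r, k = len(row_totals), len(col_totals)
    if n < 2 or r < 2 or k < 2:
        return float("nan")  # undefined when a variable is constant

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denominator = min(r_corr - 1, k_corr - 1)
    if denominator <= 0:
        return float("nan")
    return math.sqrt(phi2_corr / denominator)


perfect = cramers_v_corrected(["a", "b"] * 50, ["x", "y"] * 50)
independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["x", "y", "x", "y"] * 25)
print(perfect, independent)
```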

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

languagesocialNbFollowerssocialNbFollowssocialProductsLikedproductsListedproductsSoldproductsPassRateproductsWishedproductsBoughtA_userSocial useBuyersellerUsers
0en14710772617474.01041activesocial userBuyersellerUsers
1en167821917099.000activesocial userNon buyersellerUsers
2fr13713603316394.0103activesocial userBuyersellerUsers
3en131101412215292.070activesocial userNon buyersellerUsers
4en1678025125100.000activesocial userNon buyersellerUsers
5de1301214712391.000activesocial userNon buyersellerUsers
6en121011403110894.0531105activesocial userBuyersellerUsers
7fr5393510698.000activesocial userNon buyersellerUsers
8it7441376451671010485.018420activesocial userNon buyersellerUsers
9en578451239274.062activesocial userBuyersellerUsers

Last rows

       language  socialNbFollowers  socialNbFollows  socialProductsLiked  productsListed  productsSold  productsPassRate  productsWished  productsBought  A_user    Social use       Buyer      seller      Users
98903  es        3                  8                0                    0               0             0.0               0               0               inactive  non social user  Non buyer  Non seller  Users
98904  en        3                  8                0                    0               0             0.0               0               0               inactive  non social user  Non buyer  Non seller  Users
98905  en        3                  8                6                    0               0             0.0               0               0               active    non social user  Non buyer  Non seller  Users
98906  en        3                  8                0                    0               0             0.0               0               0               inactive  non social user  Non buyer  Non seller  Users
98907  en        3                  8                0                    0               0             0.0               0               0               inactive  non social user  Non buyer  Non seller  Users
98908  fr        3                  8                0                    0               0             0.0               0               0               inactive  non social user  Non buyer  Non seller  Users
98909  fr        3                  8                0                    0               0             0.0               0               0               inactive  non social user  Non buyer  Non seller  Users
98910  en        3                  8                0                    0               0             0.0               0               0               inactive  non social user  Non buyer  Non seller  Users
98911  it        3                  8                0                    0               0             0.0               0               0               inactive  non social user  Non buyer  Non seller  Users
98912  fr        3                  8                0                    0               0             0.0               0               0               inactive  non social user  Non buyer  Non seller  Users

Duplicate rows

Most frequent

      language  socialNbFollowers  socialNbFollows  socialProductsLiked  productsListed  productsSold  productsPassRate  productsWished  productsBought  A_user    Social use       Buyer      seller      Users  count
144   en        3                  8                0                    0               0             0.0               0               0               inactive  non social user  Non buyer  Non seller  Users  36601
817   fr        3                  8                0                    0               0             0.0               0               0               inactive  non social user  Non buyer  Non seller  Users  18896
1135  it        3                  8                0                    0               0             0.0               0               0               inactive  non social user  Non buyer  Non seller  Users  5266
0     de        3                  8                0                    0               0             0.0               0               0               inactive  non social user  Non buyer  Non seller  Users  4690
734   es        3                  8                0                    0               0             0.0               0               0               inactive  non social user  Non buyer  Non seller  Users  4527
488   en        4                  8                0                    0               0             0.0               0               0               inactive  social user      Non buyer  Non seller  Users  2576
190   en        3                  8                1                    0               0             0.0               0               0               active    non social user  Non buyer  Non seller  Users  1552
998   fr        4                  8                0                    0               0             0.0               0               0               inactive  social user      Non buyer  Non seller  Users  1534
856   fr        3                  8                1                    0               0             0.0               0               0               active    non social user  Non buyer  Non seller  Users  739
151   en        3                  8                0                    0               0             0.0               1               0               active    non social user  Non buyer  Non seller  Users  649
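A table like this groups identical rows and counts how often each one occurs. The sketch below shows the idea with `collections.Counter` on a few toy rows; the rows and the function name are hypothetical, and how the report derives its overall total of 91012 duplicate rows is not shown here.

```python
from collections import Counter

# Toy rows: (language, socialNbFollowers, socialNbFollows, socialProductsLiked).
rows = [
    ("en", 3, 8, 0),
    ("en", 3, 8, 0),
    ("fr", 3, 8, 0),
    ("en", 3, 8, 1),
    ("fr", 3, 8, 0),
    ("en", 3, 8, 0),
]


def duplicate_groups(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


groups = duplicate_groups(rows)

# Most frequent duplicated rows first, as in the table above.
for row, count in sorted(groups.items(), key=lambda item: item[1], reverse=True):
    print(row, count)
```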